# prefect-integrations
g
Hi everybody, I don’t know if this is the right channel to ask these Kubernetes-related questions. Correct me if I am wrong.
1. I would like to build my own docker image based on one of the prefect images, with a pip command installing some libraries like prefect-dbt on top of it. Could you share an example Dockerfile with the correct ENTRYPOINT etc. with me, please?
2. Is it possible to set the namespace in which the pods will be created via the values.yaml file in the helm chart when deploying the prefect-worker in the Kubernetes cluster?
m
Hi, for the second point, you will need to work on the job_template.json, which defines the configuration for the jobs: https://github.com/PrefectHQ/prefect-helm/tree/1904f18b2155f85a70141e27e289150bec245431/charts/prefect-worker#configuring-a-bas[…]late-on-the-worker
You can define it like this in values.yaml
g
This exports a file with parameters. Should I hardcode my new parameter directly in this JSON file?
m
Yes, you specify the default values
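Specifying a default in the exported template comes down to editing the default of the matching entry under variables.properties. A minimal Python sketch of that edit, assuming the exported file follows the usual work pool layout; the trimmed template dict is a hypothetical stand-in for the real file, and set_template_defaults is an illustrative helper, not a Prefect API:

```python
import copy
import json


def set_template_defaults(template: dict, defaults: dict) -> dict:
    """Return a copy of a base job template with new variable defaults."""
    updated = copy.deepcopy(template)
    properties = updated.setdefault("variables", {}).setdefault("properties", {})
    for name, value in defaults.items():
        # Create the variable entry if it is missing, then set its default.
        properties.setdefault(name, {})["default"] = value
    return updated


# Trimmed, hypothetical stand-in for the exported job_template.json.
template = {
    "job_configuration": {"namespace": "{{ namespace }}"},
    "variables": {
        "properties": {
            "namespace": {"type": "string", "default": "default"},
        }
    },
}

patched = set_template_defaults(template, {"namespace": "prefect"})
print(json.dumps(patched["variables"]["properties"]["namespace"]))
```

The helper copies the template before editing, so the exported original stays available for comparison.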
g
Any updates regarding the first question 🙂 ? 1. I would like to build my own docker image based on one of the prefect images, with a pip command installing some libraries like prefect-dbt on top of it. Could you share an example Dockerfile with the correct ENTRYPOINT etc. with me, please?
I could modify the namespace, thank you so much 🙂 How should I define the env variables here?
{
  "EXTRA_PIP_PACKAGES": "prefect-dbt prefect-github prefect-gcp prefect-sqlalchemy dbt-core>=1.8.0,<1.9.0 dbt-postgres>=1.8.0,<1.9.0 dbt-bigquery>=1.8.0,<1.9.0",
  "PREFECT_LOGGING_LEVEL": "DEBUG"
}
here is how I configure it via the UI
This worked for me!! 🙂
n
1. what is the dockerfile for?
   a. runtime pod: you don't need to alter the entrypoint
   b. worker pod: again, you generally shouldn't need to when using the helm chart
2. namespace is a field on the work pool, so it doesn't need to be configured when you provision the infra, but you can pass in defaults via baseJobTemplate as @Mehdi kindly mentioned
g
Hi Nate, thank you for your response. It installs the additional dependencies each time a task runs. I want to build the docker image to make the pipeline more performant.
Screenshot 2024-11-22 at 17.05.34.png
if I remove the last two pieces, would it work?
ENTRYPOINT and CMD?
n
yes it should, if you check the link I sent above. the chart should handle invoking the command for you
g
I am still getting the same error:
➜  prefect-docorbit git:(main) ✗ kubectl logs -n prefect notorious-bug-bpw6x-lkg6m
exec /usr/bin/tini: exec format error
This didn’t work either 🙂
I see that in the base prefect image the 32nd statement creates an entrypoint
I added 35 to mimic that, but it didn’t work
n
hi @Göktuğ Aşcı - have you looked at the example I linked above?
g
Is it this link? https://github.com/zzstoatzz/prefect-pack/tree/main/examples/run_a_prefect_worker/on_k8s But this is a default example, isn’t it? I can run the pod with the following env variable attached to it perfectly:
{
  "EXTRA_PIP_PACKAGES": "prefect-dbt prefect-github prefect-gcp prefect-sqlalchemy dbt-core>=1.8.0,<1.9.0 dbt-postgres>=1.8.0,<1.9.0 dbt-bigquery>=1.8.0,<1.9.0",
  "PREFECT_LOGGING_LEVEL": "DEBUG"
}
I get the issue when I want to customise my own docker image
m
I doubt that adding pip packages to the worker would pass these packages to the jobs
g
Have you shared another link related to that?
n
it won't, but this goes back to my original question: you don't need the worker pod to have runtime deps
you define an image on the work pool, which flow run pods will use, and which is entirely independent of the worker pod
m
@Göktuğ Aşcı what you want to do is create a custom base image for your jobs. You can define which image to use in the same JSON (job_template)
g
Oh sorry, I missed that you posed a question there. The runtime pod should pull an image that has the following packages installed on top of the default docker image
prefect-dbt prefect-github prefect-gcp prefect-sqlalchemy dbt-core>=1.8.0,<1.9.0 dbt-postgres>=1.8.0,<1.9.0 dbt-bigquery>=1.8.0,<1.9.0
And it should do it just once from my personal dockerhub repo
n
@Göktuğ Aşcı you might find this useful
g
oh this video is with docker pool I guess. I am running this with the kubernetes pool
n
yeah, a lot of the ideas are the same. the main point that's relevant here is: you do not need to define your runtime python dependencies when you deploy the worker. the worker pod just spins up pods for flow runs. your deployments can override the image to define what deps that deployment needs, or you can even override the image on a per flow run basis
g
oh my lack of understanding of some Docker and Kubernetes concepts may be the cause of our miscommunication here 🙂
When I pass this parameter: { "EXTRA_PIP_PACKAGES": "prefect-dbt prefect-github prefect-gcp prefect-sqlalchemy dbt-core>=1.8.0,<1.9.0 dbt-postgres>=1.8.0,<1.9.0 dbt-bigquery>=1.8.0,<1.9.0", "PREFECT_LOGGING_LEVEL": "DEBUG" } it installs these dependencies at runtime, but that is what I want to avoid, since it installs the dependencies over and over again each time the runner pod is created. I want to create another base image with these dependencies that would be used by the worker and runner pods to avoid this phase.
(or just by the runner pod)
n
ok so there's 2 types of pods here
• the worker
  ◦ doesn't need prefect-dbt or any other ones you listed, bc these are dependencies required by your flow run. all your worker needs is 'prefect[kubernetes]' because all it does is communicate with the kubernetes API to create pods where your flow runs happen
  ◦ start your worker like this
• the flow run pods that are created by the worker
  ◦ these pods will actually need prefect-dbt and your other runtime python dependencies, bc this is where your code will run.
  ◦ you do not configure these dependencies by configuring the worker; these dependencies are configured by setting an image on your kubernetes work pool in the UI or via job variable overrides
setting an image on your kubernetes work pool is how you can avoid this phase
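The per-run image override mentioned above can be sketched in Python as follows. This assumes a recent Prefect release in which run_deployment accepts job_variables; the image name and the deployment name are hypothetical placeholders, not values from this thread:

```python
# Hypothetical image name; replace with an image you have built and pushed.
RUNTIME_IMAGE = "docker.io/your-user/prefect-dbt-runtime:latest"

# Job variables override the work pool defaults for a deployment or a single run.
# EXTRA_PIP_PACKAGES is deliberately absent: the image already contains the packages.
JOB_VARIABLES = {
    "image": RUNTIME_IMAGE,
    "env": {"PREFECT_LOGGING_LEVEL": "DEBUG"},
}


def run_with_prebuilt_image(deployment_name: str = "dbt-pipeline/prod"):
    """Trigger one flow run whose pod is created from RUNTIME_IMAGE."""
    # Imported here so this module still loads where prefect is not installed.
    from prefect.deployments import run_deployment

    # The deployment name is a hypothetical "flow-name/deployment-name" pair.
    return run_deployment(name=deployment_name, job_variables=JOB_VARIABLES)
```

Setting the same image as the default on the work pool has the same effect for every run, without passing job variables each time.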
g
Thanks. I am stuck at creating the image to set for the runner pod, then.
I don’t need the prefect base image either, right?
I can simply use a plain python base image too, installing prefect etc. on it, then
m
Yes, that's correct. @Göktuğ Aşcı
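A rough sketch of what such a runtime image could look like, written as a small Python script that renders the Dockerfile text. The base image tag python:3.12-slim and the unpinned prefect package are assumptions, and render_dockerfile is an illustrative helper; the package list is the one from this thread. No ENTRYPOINT or CMD is emitted, following the advice above that the runtime pod does not need one:

```python
import shlex

# Runtime dependencies from the thread, plus prefect itself (left unpinned here).
RUNTIME_PACKAGES = [
    "prefect",
    "prefect-dbt",
    "prefect-github",
    "prefect-gcp",
    "prefect-sqlalchemy",
    "dbt-core>=1.8.0,<1.9.0",
    "dbt-postgres>=1.8.0,<1.9.0",
    "dbt-bigquery>=1.8.0,<1.9.0",
]


def render_dockerfile(packages, base_image="python:3.12-slim"):
    """Render a two-line Dockerfile that installs the packages at build time."""
    # Quote each requirement so the shell in RUN does not treat < and > as redirection.
    quoted = " ".join(shlex.quote(package) for package in packages)
    return f"FROM {base_image}\nRUN pip install --no-cache-dir {quoted}\n"


dockerfile_text = render_dockerfile(RUNTIME_PACKAGES)
print(dockerfile_text)
```

Because the install happens once at build time, flow run pods created from this image no longer need EXTRA_PIP_PACKAGES.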
g
It was an architecture problem. Thanks to @Mehdi we resolved that but I have another problem to resolve now 😄 but it looks simpler than the previous.
1. Thank you @Nate for making me realize that worker and runner pods use different images.
2. Thank you @Mehdi for suspecting that the images built on my M1 Mac may be incompatible with the cluster that I pay for 🙂 (amd64 vs arm64).
After correcting the mistakes above, I could make everything work. The pipelines are much faster now since they don’t have to install dependencies from scratch.
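The amd64 vs arm64 mismatch is what produces the "exec format error" seen earlier: an image built natively on an Apple Silicon machine targets arm64 unless a platform is requested. A small Python sketch that assembles a docker buildx command for an amd64 cluster; the image tag is a hypothetical placeholder and buildx_command is an illustrative helper:

```python
import platform
import shlex


def buildx_command(tag, target_platform="linux/amd64", context="."):
    """Return a docker buildx invocation that builds for the cluster's CPU architecture."""
    return [
        "docker", "buildx", "build",
        "--platform", target_platform,  # architecture of the cluster nodes, not the laptop
        "--tag", tag,
        "--push",  # push the result to the registry after building
        context,
    ]


# Hypothetical image tag; replace with your own repository.
command = buildx_command("docker.io/your-user/prefect-dbt-runtime:latest")
print("build host architecture:", platform.machine())
print(shlex.join(command))
```

Comparing the printed host architecture with the target platform makes the mismatch visible before the image reaches the cluster.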